Voice synthesizing device.



A voice synthesizing device which compiles
wave segments such as pitch wave segments in 
speech to synthesize speech, which device com- 
prises a connection type memory for storing a con- 
nection type expressing the connection state of each 
wave segment for that point of the wave segment 
which connects with another wave segment; and a 
wave segment connector which, when the wave seg- 
ments are connected, connects an end sampling 
point and a lead sampling point of the wave segment 
with a normal sampling period, or with a normal 
sampling period compressed or expanded by only 
1/2 of the sampling period according to the connection
type stored in the connection type memory.
VOICE SYNTHESIZING DEVICE



The present invention relates to a voice syn- 
thesizing device which compiles wave segments 
such as pitch wave segments and quasi-voice 
wave segments to reproduce a voice wave. 

It is well known that, among the different voice
waves, the waves of voiced sounds such as vowels
have a redundant pitch structure in which essentially
the same wave is repeated from several to a dozen
times with a period of roughly 2 to 10 ms.
Conventionally, voice synthesizers have employed a
phoneme segment compiling method which uses this
pitch structure to generate a synthesized voice.
Voice synthesizers of this type repeat and connect
pitch wave segments or quasi-voice wave segments for
a predetermined period to synthesize a voice wave.
This reduces the amount of wave segment data needed
for said pitch wave segments or quasi-voice wave
segments while maintaining high quality in the
eventually synthesized voice.

However, because a conventional voice synthesizer
using the segment compiling method described above
synthesizes a voice wave by simply repeating and
connecting pitch wave segments, or voice wave
segments based on said pitch wave segments, for a
predetermined period, distortion arises where said
pitch wave segments or quasi-voice wave segments are
connected, as described below.

Fig. 4 shows an example of the pitch wave 
segment used in voice waveform synthesis. Each 
double circle in Fig. 4 shows the sampled value at 
every sampling time (hereafter referred to as a 
sampled value); the solid lines drawn perpendicular 
to the time axis from these points represent the 
sampling time; the dotted lines drawn perpendicu- 
lar to the time axis between these sampling points 
represent the interpolated sampling time at which 
said sampled value is interpolated to output the 
interpolated value during the waveform synthesis. 
The pitch wave segment shown in Fig. 4 may be of 
one of the following four wave types depending on 
the position at which the wave crosses the zero 
point.

Specifically, the sampling time period Ts is
divided into two phases, the first referred to as P1
and the latter as P2. Thus, in wave type (1) shown
in Fig. 4(a), the zero cross point m for the
interpolated waveform of the top sampled value of the
pitch segment falls within the range P2, and the zero
cross point o for the interpolated waveform of the
end sampled value of the pitch segment falls within
the range P2. In wave type (2) shown in Fig. 4(b), the
zero cross point for the interpolated waveform of the
top sampled value of the pitch segment falls within
the range P1, and the zero cross point for the
interpolated waveform of the end sampled value of the
pitch segment falls within the range P1. In wave
type (3) shown in Fig. 4(c), the zero cross point for
the interpolated waveform of the top sampled value of
the pitch segment falls within the range P2, and the
zero cross point for the interpolated waveform of the
end sampled value of the pitch segment falls within
the range P1. In wave type (4) shown in Fig. 4(d),
the zero cross point for the interpolated waveform
of the top sampled value of the pitch segment falls
within the range P1, and the zero cross point for
the interpolated waveform of the end sampled value of
the pitch segment falls within the range P2. Thus, if
pitch wave segments of the types described above are
simply repeated and connected, the pitch cycle where
the segments are connected will be shifted in phase
by a quantity equal to half the sampling period,
resulting in a distorted wave which differs from the
original wave.

In other words, if, for example, like waves of
type (3) are simply connected, the phase of the
resulting wave will be delayed by one-half sampling
cycle as shown in Fig. 5(b). Furthermore, if like
waves of type (4) are simply connected, the phase of
the resulting wave will be advanced by one-half
sampling cycle as shown in Fig. 5(c). In this event,
interference will occur in the rise of the pitch wave
segment, and the sound quality of the eventually
synthesized voice will significantly deteriorate. The
deterioration in sound quality is particularly severe
when the pitch period is short (i.e., the pitch
frequency is high) as in female voices.
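
The four wave types and the half-sample shift described above can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical and do not appear in this description, and the 8 kHz sampling rate used in the note after the code is an assumed figure chosen for the arithmetic.

```python
# Illustrative sketch; all names here are hypothetical.
# "P1" is the first half of the sampling period Ts, "P2" the second half.

WAVE_TYPES = {
    # (phase of top zero cross, phase of end zero cross): wave type
    ("P2", "P2"): 1,
    ("P1", "P1"): 2,
    ("P2", "P1"): 3,
    ("P1", "P2"): 4,
}


def wave_type(top_phase, end_phase):
    """Return the wave type (1 to 4) for the given zero cross phases."""
    return WAVE_TYPES[(top_phase, end_phase)]


def naive_repeat_shift(top_phase, end_phase):
    """Phase error, in units of Ts, when a segment is simply repeated
    onto itself at the normal sampling period.
    Negative means delayed, positive means advanced."""
    if end_phase == top_phase:
        return 0.0                      # types (1) and (2): no shift
    return -0.5 if end_phase == "P1" else 0.5   # type (3) / type (4)


def shift_fraction_of_pitch(sample_rate_hz, pitch_period_ms):
    """Half a sampling period expressed as a fraction of the pitch period."""
    half_sample_s = 0.5 / sample_rate_hz
    return half_sample_s / (pitch_period_ms / 1000.0)
```

Under the assumed 8 kHz sampling rate, a half-sample shift is about 2.5 percent of a 2.5 ms pitch period but only about 0.6 percent of a 10 ms pitch period, which is consistent with the deterioration being more severe for short pitch periods.
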

In order to solve the problem discussed above,
there are two methods. According to one method,
one pitch wave segment is cut out, temporarily
converted to a frequency axis wave by fast Fourier
transformation (FFT) analysis, and reconverted to a
time axis wave by inverse FFT after phase adjustment
so that both ends of the pitch wave segment can
approach zero. According to the other method, an
impulse response wave is reproduced by linear
predictive coding (LPC) of the one pitch wave
which has been cut out, and this impulse response
wave is used as the pitch wave segment. However,
in the above methods, the ends of the pitch wave
segment are not sufficiently close to zero and
distortion thus remains in the pitch wave segment,
resulting in variations in the tone.

Therefore, it is an object of the present invention
to provide a voice synthesizing device which is
effective to produce a synthetic voice with no
sound quality distortion through a simple process
to connect the wave segments.

In order to achieve the aforementioned objective,
a voice synthesizing device of the present
invention for compiling wave segments such as
pitch wave segments in speech to synthesize
speech is characterized by the provision of a
connection type memory for storing a connection type
descriptive of the connection state of that point
where said wave segments are connected; and a
wave segment connector which, when said wave
segments are connected, connects the end sampling
point and the lead sampling point of the wave
segments with a conventional sampling period, or
with a conventional sampling period compressed or
expanded by only 1/2 of the sampling period
according to the connection type stored in said
connection type memory.

Thus, when voice wave segments are compiled 
to synthesize a voice, the connection type stored in 
the connection type memory is referenced. According
to the referenced connection type, the end and
leading sampling points of the wave segments are
connected with a conventional sampling period, or 
with a conventional sampling period compressed or 
expanded by only 1/2 of the sampling period so 
that said wave segments are connected smoothly 
to provide a synthesized voice wave. 

The invention is further described in detail in 
connection with the drawings in which 

Fig. 1 is a block diagram of a preferred 
embodiment of a voice synthesizing device accord- 
ing to the present invention; 

Fig. 2 is a diagram showing the format of 
storage of pitch wave segment data in a read-only 
memory (ROM); 

Fig. 3 is a flow chart showing the sequence 
of operation for the voice synthesizing operation; 

Fig. 4 is a descriptive drawing of the wave
types;

Fig. 5 is an explanatory diagram showing the 
wave types and their connection methods; 

Fig. 6 is an explanatory diagram showing 
wave types according to an alternative embodiment 
of the present invention; and 

Fig. 7 is an explanatory diagram showing the 
wave types and their connection methods accord- 
ing to the alternative embodiment of the present 
invention. 

A first preferred embodiment of the present 
invention will now be described with reference to 
Fig. 1 which shows a block diagram of a voice 
synthesizing device according to the present inven- 
tion. 

Reference numeral 1 is a control ROM (read-only
memory) which stores a control program used
by CPU (central processing unit) 5 for voice
synthesis; reference numeral 2 is a RAM (random
access memory) used as a work memory during
voice synthesis; reference numeral 3 is a data
ROM used to store voice coding data; reference
numeral 4 is an I/O interface through which
input/output signals pass at the start of voice
synthesis and other processes; reference numeral 6 is
a D/A converter used for digital-to-analog conversion
of voice wave data synthesized under the
control of CPU 5; and reference numeral 7 is an
amplifier which amplifies an input analog voice
wave and outputs it to a loudspeaker 8.

The control ROM 1, RAM 2, data ROM 3, I/O
interface 4, CPU 5, and D/A converter 6, all used in
the voice synthesizing device of the above
construction, can be integrated together on a single
chip, and it is also possible to employ an external
data ROM 9 for storing voice coding data for
system expansion.

When a start signal necessary to initiate the
voice synthesis is input to a voice synthesizing
device of the above construction from an external
source through I/O interface 4, CPU 5 begins the
voice synthesizing operation based on the control
program stored in the control ROM 1. Thus, voice
synthesis wave data is generated by CPU 5
based on the voice coding data stored in the data
ROM 3. The generated voice synthesis wave data
is converted to an analog signal by D/A converter
6, then amplified by amplifier 7, and is finally
outputted as a synthesized voice from the loudspeaker
8.

As described below, the voice synthesizing device
according to the present invention generates a
synthesized voice free of distortion in the pitch
wave rise by connecting wave segments such as
pitch wave segments or quasi-voice wave segments
to generate the synthesized voice.

According to a first method, the connection shown
in Fig. 5(a) is used in two cases. In the first case,
the time axis zero cross point of the interpolated
waveform for the end sampled value of the preceding
pitch wave segment and the time axis zero cross point
of the interpolated waveform for the top sampled
value of the following pitch wave segment are both
within the range P2 when the waves are connected, due
to the connection of similar waves of type (1) or of
dissimilar waves of type (1) and type (3) as shown in
Fig. 4. In the second case, both of these zero cross
points are within the range P1 when the waves are
connected, due to the connection of similar waves of
type (2) or of dissimilar waves of type (2) and
type (4). In either case, the end sampled value and
the top sampled value of the pitch wave segments are
output at the conventional sampling points and the
pitch wave segments are connected. Then, the
interpolated value between the end sampled value and
the top sampled value (indicated by a solid triangle)
is computed at a point equal to 1/2 of the sampling
interval Ts and is outputted so that the two pitch
wave segments can be connected smoothly.
Hereinafter the connection of such pitch wave
segments as just described shall be referred to as
connection type 0a.

As shown in Fig. 5(b), when the time axis zero
cross point of the interpolated waveform for the
end sampled value of the preceding pitch wave
segment is within the range P1 and the time axis
zero cross point of the interpolated waveform for
the top sampled value of the following pitch wave
segment is within the range P2 when the waves are
connected, due to the connection of dissimilar
waves of type (2) and type (1) or of waves of type (2)
and type (3), the wave segments are not connected
at the conventional sampling point; the conventional
sampling interval between the end and top
sampled values is compressed by one-half and is
then outputted to connect the pitch wave segments.
Hereinafter the connection of such pitch
wave segments as just described will be referred to
as connection type 1a.

As shown in Fig. 5(c), when the time axis zero 
cross point of the interpolated waveform for the 
end sampled value of the preceding pitch wave 
segment is within the range P2 and the time axis 
zero cross point of the interpolated waveform for 
the top sampled value of the following pitch wave 
segment is within the range P1 when the waves are 
connected due to the connection of dissimilar 
waves of type (1) and type (2) or of waves of type 
(1) and type (4), the wave segments are not con- 
nected at the conventional sampling point; the con- 
ventional sampling interval between the end and 
top sampled values is expanded by one-half and is 
then outputted to connect the pitch wave seg- 
ments. The period between the end and top sam- 
pled values of the pitch wave segments is interpo- 
lated as follows. 

Specifically, assume the end sampled value
of the preceding pitch wave segment is |x1| and
the top sampled value of the following pitch wave
segment is |x2|. If |x1| > |x2|, the interpolated
value x1/2 is computed following the end sampled
value |x1| (specifically, the higher peak value),
and is then outputted at an interval of Ts/2. Next, the
period between this interpolated value x1/2 and the
top sampled value |x2| (specifically, the lower
peak value) is interpolated and is then outputted.
Hereinafter the connection of such pitch wave
segments as just described shall be referred to as
connection type 2-(a). Furthermore, if |x1| < |x2|,
the interpolated value x2/2 preceding the top sampled
value |x2| is computed and is then outputted at
an interval of Ts/2. Next, the period between this
interpolated value x2/2 and the end sampled value
|x1| (specifically, the lower peak value) is
interpolated and is then outputted. Hereinafter the
connection of such pitch wave segments as just described
shall be referred to as connection type 2-(b).
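
The selection among connection types 0a, 1a, 2-(a) and 2-(b) and the values output across a junction can be sketched in Python as follows. This is a sketch under stated assumptions rather than the device's actual procedure: the function names are hypothetical, straight-line (linear) interpolation is assumed because the interpolation formula is not given here, the placement of the half-value in type 2-(b) is one reading of the text, and the case of equal peak values, which the description does not cover, is arbitrarily treated as type 2-(b).

```python
# Illustrative sketch; names and the linear interpolation are assumptions.

# Interval between end sample and top sample, in units of Ts.
GAP_IN_TS = {"0a": 1.0, "1a": 0.5, "2a": 1.5, "2b": 1.5}


def connection_type(end_phase, top_phase, x1=0.0, x2=0.0):
    """Choose the connection type from the zero cross phases ("P1"/"P2")
    of the preceding segment's end and the following segment's top.
    x1 is the end sampled value, x2 the top sampled value."""
    if end_phase == top_phase:
        return "0a"                      # normal sampling period
    if end_phase == "P1":
        return "1a"                      # end in P1, top in P2: compress
    # end in P2, top in P1: expand; equal peaks fall to 2b (assumption)
    return "2a" if abs(x1) > abs(x2) else "2b"


def junction_outputs(ctype, x1, x2):
    """Return (time after the end sample in Ts, value) pairs for the
    values output after x1, up to and including the top sample x2."""
    if ctype == "0a":
        return [(0.5, (x1 + x2) / 2), (1.0, x2)]
    if ctype == "1a":
        return [(0.5, x2)]
    if ctype == "2a":
        half = x1 / 2                    # follows the higher peak x1
        return [(0.5, half), (1.0, (half + x2) / 2), (1.5, x2)]
    if ctype == "2b":
        half = x2 / 2                    # precedes the higher peak x2
        return [(0.5, (x1 + half) / 2), (1.0, half), (1.5, x2)]
    raise ValueError(f"unknown connection type: {ctype}")
```

For example, `junction_outputs("2a", 8.0, 4.0)` places the top sample 1.5 Ts after the end sample, with two intermediate values at Ts/2 spacing.
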

According to a second method, sampling is
performed at twice the frequency (half the cycle)
defined by the Nyquist theorem. Whether at an
even-numbered sampling point or an odd-numbered
sampling point, the sampling data used for
voice synthesis is resampled at the standard
Nyquist theorem cycle from the sampling point
which is nearest the pitch segment rise. This wave
is illustrated in Fig. 6. Here, the even-numbered
sampling points are the sampling points (those
shown by a solid line in Fig. 6) occurring in the
Nyquist theorem cycle, and the odd-numbered
sampling points (those shown by a dotted line in
Fig. 6) are the sampling points occurring between
even-numbered sampling points. In this case,
sampling data obtained at the sampling points indicated
by a double circle are the sampled values (which
are hereinafter referred to as object samples) which
will be the object of voice synthesis. These
segments may be either wave type (1) or type (2).

As shown in Fig. 7(a), when the time axis zero
cross point of the interpolated waveform for the
end sampled value which will be the object of voice
synthesis for the preceding pitch wave segment
(hereinafter referred to as end object sample) and
the time axis zero cross point of the interpolated
waveform for the leading object sample of the
following pitch wave segment are both within the
range P2 due to the connection of similar waves of
type (5) or dissimilar waves of type (5) and type
(6), the end object sample which is the object of
voice synthesis and the leading object sample are
outputted at the sampling points which will be the
object of voice synthesis to connect the pitch wave
segments. Then, at the half point of the object
sampling period, the end sampled value q of the
preceding pitch wave segment is outputted as the
interpolated value so that the two pitch wave
segments can be connected smoothly. Hereinafter,
connection of such pitch wave segments will be
referred to as connection type 0b.

As shown in Fig. 7(b), when the time axis zero
cross point of the interpolated waveform for the
end object sample of the preceding pitch wave
segment is within the range P1 and the time axis
zero cross point of the interpolated waveform for
the leading object sample of the following pitch
wave segment is within the range P2 due to the
connection of similar waves of type (6) or dissimilar
waves of type (6) and type (5), the pitch wave
segments are not connected at the sampling point
which is the object of voice synthesis; the period
between the end object sample and the leading
object sample of the pitch wave segments is
compressed by one-half and is then outputted to
connect the pitch wave segments. Hereinafter,
connection of such pitch wave segments will be referred
to as connection type 1b.

Fig. 2 shows one example of the data format
when, for example, the pitch wave segments are
analyzed and the resulting pitch wave segment
data is stored in ROM 3 (see Fig. 1). The illustrated
data format is comprised of encoding data of
multiple pitch wave segments, each of said encoding
data for each pitch wave segment including
interpolation data and voice data. The interpolation data
consists of end segment data 11 identifying whether
the pitch wave segment is the last pitch wave
segment or not, encoding method data 12 identifying
the method used to encode the sampled data
of the pitch wave segment, repeat number data 13
telling how many times the pitch wave segment
is repeated, connection type data 14, as shown
in Fig. 5 and Fig. 7, for use when the same pitch
wave segment is repeated, and connection type
data 15 (hereinafter referred to as a next pitch
wave segment connection type) for when the given
pitch wave segment is connected to the next
adjacent pitch wave segment. The voice data includes
sample number data 16 specifying the number of
encoded data included in the pitch wave segment,
and a series of multiple encoded data 17 to
19 for each sampling point used in voice synthesis.
This encoded data is stored as a bit string according
to the encoding method (e.g., pulse code
modulation (PCM) or adaptive differential pulse code
modulation (ADPCM)) stored in the encoding
method data 12 for the interpolation data.
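
The interpolation data of Fig. 2 can be sketched in Python as a record unpacked from a single byte, since the synthesizing operation described in connection with Fig. 3 reads the interpolation data as 1 byte. The bit widths and bit order below are purely hypothetical, as are the class and function names; the description does not state how the five fields are laid out within the byte.

```python
from dataclasses import dataclass


@dataclass
class InterpolationData:
    end_segment: bool        # end segment data 11
    encoding_method: int     # encoding method data 12
    repeat_count: int        # repeat number data 13
    repeat_connection: int   # connection type data 14
    next_connection: int     # next pitch wave segment connection type 15


def unpack_interpolation_byte(b):
    """Split one byte into the five fields.

    Assumed (hypothetical) layout, most significant bit first:
    1 bit end flag, 1 bit encoding method, 2 bits repeat number,
    2 bits repeat connection type, 2 bits next connection type.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("interpolation data must fit in one byte")
    return InterpolationData(
        end_segment=bool((b >> 7) & 0x1),
        encoding_method=(b >> 6) & 0x1,
        repeat_count=(b >> 4) & 0x3,
        repeat_connection=(b >> 2) & 0x3,
        next_connection=b & 0x3,
    )
```

Under this assumed layout, the byte 0xA7 (binary 1010 0111) would mark the last segment, with encoding method 0, repeat number 2, repeat connection type 1 and next connection type 3.
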

Referring now to the flow chart of Fig. 3, the 
voice synthesizing operation whereby pitch wave 
segments which are wave segments are connected 
and a voice is synthesized by the methods 1 and 2 
described above will be described in detail below. 

At step S1, 1 byte of interpolation data is read 
from the pitch wave segment data stored in the 
data ROM 3 according to the format shown in Fig. 
2, and the byte is divided into the end segment 
data 11, the encoding method data 12, the repeat 
number data 13, the connection type data 14, and
the next pitch wave segment connection type 15. 
Based on the obtained information, the end seg- 
ment data flag, encoding method flag, repeat coun- 
ter, repeat connection type, and next pitch wave 
segment connection type are each set in RAM 2. 
RAM 2 has an area for storing the repeat connec- 
tion type for wave segment connection and a pitch 
wave segment connection type for wave segment 
connection, and the repeat connection type of the 
preceding pitch wave segment data and the next 
pitch wave segment connection type are both set 
therein. 

At step S2, sample number data 16 specifying
the number of encoded data of one pitch wave
segment is read from the data ROM 3, and this
number is set in RAM 2 as the sample number
count.

At step S3, the first coded datum is read from
data ROM 3.

At step S4, the first coded datum is decoded
according to the encoding method set in the
encoding method flag of RAM 2, and the top sampled
value of the pitch wave segment is computed. The
interpolated value of the period between this top
sampled value and the following sampled value
(based on the second encoded datum) is then
computed. Next, the interpolation processing
required for connection with the preceding pitch
wave segment is executed according to the
next pitch wave segment connection type of the
preceding pitch wave segment data set in the
repeat connection type for pitch wave segments in
RAM 2. Furthermore, the timing of the output of the
computed top sampled value to the D/A converter
6 is computed: for connection types 0a and 0b, the
normal timing is used; for connection types 1a and
1b, the timing is advanced by one-half of a sampling
cycle; for connection types 2a and 2b, the timing
is delayed by one-half of a sampling cycle.

At step S5, the top sampled value computed at
step S4 and the output timing of the preceding and
following interpolated values computed in step S4
are outputted to D/A converter 6.

In other words, the period between the end
sampled value of the preceding pitch wave segment
and the top sampled value of the current pitch wave
segment is expanded or compressed by one-half
sampling cycle, or left unchanged, according to the
four connection types shown in Fig. 5; it is then
interpolated and D/A converted.

At step S6, the next encoded datum (second
encoded datum) is read from data ROM 3.

At step S7, the next encoded datum is decoded
according to the encoding method, and the
next sampled value is computed. Then, the
interpolated value of the period to the next sampled value
is computed. The computed sampled value and the
interpolated value are outputted to D/A converter 6
at the normal timing (specifically, the normal
sampling point).

At step S8, the sample counter is decremented
by 1, and it is determined based on this value
whether the processing of the encoded data of the
current pitch wave segment has been completed or
not. If all processing has been completed, the flow
advances to step S9; if not, the flow returns to
step S6 and processing of the next encoded datum
is executed.

At step S9, the repeat connection type of the 
preceding pitch wave segment data set at the 
repeat connection type for pitch wave segments in 
RAM 2 is reset to the repeat connection type of the
current pitch wave segment data set in the repeat 
connection type in RAM 2. 

At step S10, the repeat counter in RAM 2 is
decremented by 1, and it is determined based on
this value whether all repetitions of the current
pitch wave segment are completed or not. If the
result is completion, the flow advances to step S11;
if not, the flow returns to step S3, the first encoded
datum of the current pitch wave segment is again
inputted, and repeat processing is executed.

At step S11, the next pitch wave segment
connection type of the preceding pitch wave seg- 
ment data set in the next pitch wave segment 
connection type for pitch wave segments in RAM 2 
is reset to the next pitch wave segment connection 
type of the current pitch wave segment data set in 
the next pitch wave segment connection type of 
RAM 2. 

At step S12, the end segment data flag in RAM
2 is referenced to determine whether the current
pitch wave segment is the end segment. If the
result is "yes", the voice synthesis operation is
completed; if "no", the flow returns to step S1, the
next pitch wave segment data is read, and
processing of the next pitch wave segment data
begins.
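
The overall loop of steps S1 to S12 can be sketched in Python as follows, tracking only the output time of each sampled value in units of the sampling period Ts. This is a simplified sketch and not the control program itself: the dictionary keys and function name are hypothetical, the sampled values are assumed to be already decoded, the interpolated values output between samples are omitted, and the repeat number is assumed to be the total number of times a segment is played.

```python
# Illustrative sketch; names and data layout are assumptions.

# Interval from the previous end sample to the next top sample, in Ts:
# normal for 0a/0b, advanced by one-half for 1a/1b, delayed for 2a/2b.
GAP_IN_TS = {"0a": 1.0, "0b": 1.0, "1a": 0.5, "1b": 0.5, "2a": 1.5, "2b": 1.5}


def synthesize(segments):
    """Return a list of (time in Ts, sampled value) pairs.

    Each segment is a dict with keys "samples", "repeat",
    "repeat_connection", "next_connection" and "end".
    """
    out = []
    t = 0.0
    pending = None                        # connection type for the next junction
    for seg in segments:                  # S1: read the segment's data
        for _ in range(seg["repeat"]):    # S10: repeat counter
            for i, value in enumerate(seg["samples"]):   # S2 to S8
                if i == 0:
                    if pending is not None:
                        t += GAP_IN_TS[pending]   # S4: timing of the top sample
                else:
                    t += 1.0              # S7: normal sampling point
                out.append((t, value))    # S5 / S7: output to the D/A converter
            pending = seg["repeat_connection"]    # S9
        pending = seg["next_connection"]          # S11
        if seg["end"]:                    # S12: last segment reached
            break
    return out
```

With a three-sample segment repeated twice using connection type 1a and then joined to the next segment with type 2a, the second repetition starts 0.5 Ts after the first ends and the next segment starts 1.5 Ts after the second repetition ends.
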

Thus, wave segment connection types are 
categorized by the combination of the connections 
of the pitch wave segments of differing wave types. 
Based on the connection type, the period between 
the end sampling point and the leading sampling 
point of connected pitch wave segments may be 
compressed or expanded by one-half of the normal 
sampling period, or the normal sampling period 
may be used to connect the wave segments. 
Therefore, pitch wave segments can be connected 
smoothly by a simple operation without producing 
any phase shift in the connection of the pitch wave 
segments. In other words, in a voice synthesizing 
device according to the present invention, distortion 
does not occur in the rise of the pitch wave seg- 
ment and sound quality deterioration is not pro- 
duced. 

In the foregoing preferred embodiment as de- 
scribed above, a pitch wave segment is used as 
the wave segment, but the present invention shall 
not be so limited, and a voice wave segment 
conforming to a pitch wave segment may also be 
used. 

As will be understood from the foregoing description
of the present invention, no phase shift occurs at the
connection of wave segments in the synthesized
voice generated by the voice synthesizing
device according to the present invention. This is
because the voice synthesizing device is provided
with a connection type memory which stores a
connection type expressing the type of connection
between the wave segments in the voice, and with a
wave segment connector by which, when said wave
segments are connected to synthesize a voice, the
end and leading sampling points of said wave
segments are connected by a normal sampling period
or by a sampling period compressed or expanded by
one-half period depending upon the connection type
stored in the connection type memory.

As a result, the period between pitch wave
segments can be interpolated and the segments
smoothly connected by a simple operation.
Therefore, according to the present invention, voice
synthesis free of distortion in the rise of connected
wave segments and with no deterioration of sound
quality can be achieved.

Although the present invention has been fully
described in connection with the preferred
embodiments thereof with reference to the accompanying
drawings, it is to be noted that various changes and
modifications are apparent to those skilled in the
art. Such changes and modifications are to be
construed as included within the scope of the
present invention defined by the appended claims,
unless they depart therefrom.


Claims 

1. A voice synthesizing device which compiles
wave segments such as pitch wave segments in
speech to synthesize speech, which device
comprises:

a connection type memory for storing a connection
type expressing the connection state of each wave
segment for that point of said wave segment which
connects with another wave segment; and
a wave segment connector which, when said wave
segments are connected, connects an end sampling
point and a lead sampling point of the wave
segment with a normal sampling period, or with a
normal sampling period compressed or expanded
by only 1/2 of the sampling period according to the
connection type stored in said connection type
memory.
